Hierarchy in Web Page Similarity Link Analysis

Author

  • Allan M. Schiffman
Abstract

Rather than using traditional text analysis to discover Web pages similar to a given page, we investigate applying link analysis. Because Web pages exist in a link-rich environment, link analysis has the potential to relate pages by any property imaginable, since links are not restricted to intrinsic properties of the page text or metadata. In particular, while link analysis for Web page similarity has been explored, prior work has deliberately ignored the explicitly hierarchical host and pathname structure within URLs. To exploit this structure, we generalize Kleinberg’s well-known “hubs and authorities” HITS algorithm; adapt the generalized algorithm to accommodate hierarchical link structure; test some sample Web queries; and argue that the results are potentially superior and that the algorithm itself is better motivated.
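
For readers unfamiliar with the baseline, the following is a minimal sketch of the standard HITS iteration that the abstract takes as its starting point. It is not the generalized or hierarchy-aware variant the paper proposes, whose details the abstract does not give. The function names `hits` and `_normalize`, the dictionary-based graph format, and the fixed iteration count are all assumptions made for illustration.

```python
from math import sqrt


def _normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all zero)."""
    norm = sqrt(sum(v * v for v in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Basic HITS on a link graph (illustrative sketch, not the paper's method).

    links: dict mapping each page to an iterable of pages it links to.
    Returns (hub, authority) dicts with unit-length score vectors.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # Hub update: sum the fresh authority scores of pages linked to.
        new_hub = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_hub[src] += new_auth[dst]

        auth = _normalize(new_auth)
        hub = _normalize(new_hub)

    return hub, auth


if __name__ == "__main__":
    # Hypothetical toy graph: two hub-like pages pointing at two targets.
    toy = {"h1": ["a", "b"], "h2": ["a"], "a": [], "b": []}
    hub_scores, auth_scores = hits(toy)
    print(sorted(auth_scores, key=auth_scores.get, reverse=True))
```

In this toy graph, page `a` is linked from both `h1` and `h2`, so it ends up with a higher authority score than `b`. Note that this baseline treats every URL as an opaque node; the host and pathname hierarchy that the paper sets out to exploit plays no role here.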

Similar resources

A Comparison Framework of Similarity Metrics Used for Web Access Log Analysis

In this paper, different types of web session similarity metrics are compared and combined for better web session clustering. Syntactic and co-occurrence information are used for similarity calculation. Syntactic information on a web page includes the place of the page in the directory hierarchy. Co-occurrence information is the number of times two web pages occur in the same sessions. V...

An Iterative Link-based Method for Parallel Web Page Mining

Identifying parallel web pages from bilingual web sites is a crucial step of bilingual resource construction for crosslingual information processing. In this paper, we propose a link-based approach to distinguish parallel web pages from bilingual web sites. Compared with the existing methods, which only employ the internal translation similarity (such as content-based similarity and page struct...

Shear-Flexural Interaction in Analysis of Reduced Web Section Beams using VM Link Element

Reduced web section beams in shear-yielding moment-resistant steel frames are used to dissipate earthquake energy. Finite element analysis indicates that the failure mode of these beams is governed by the combination of shear force and flexural moment. Therefore the analysis of frames with reduced web section beams needs to consider shear-flexural interaction in those sections. In ...

MFCRank: A Web Ranking Algorithm Based on Correlation of Multiple Features

This paper presents a new ranking algorithm MFCRank for topic-specific Web search systems. The basic idea is to correlate two types of similarity information into a unified link analysis model so that the rich content and link features in Web collections can be exploited efficiently to improve the ranking performance. First, a new surfer model JBC is proposed, under which the topic similarity i...

Hierarchical Web-Page Clustering via In-Page and Cross-Page Link Structures

Despite the wide diversity of web pages, pages residing within a particular organization are, in most cases, organized in semantically hierarchical structures. For example, the website of a computer science department contains pages about its people, courses and research, among which pages of people are categorized into faculty, staff and students, and pages of research diversify into differ...

Publication date: 2006